A Diagnostic Classification Model 
For Document Processing Skills 

Abstract 

This paper introduces a modification to the Rule Space diagnostic classification 
procedure which allows for processing of response vectors containing missing data. Rule 
Space is an approach to diagnostic classification which involves characterizing examinees' 
performances in terms of an underlying cognitive model of generalized problem-solving skills. 
It has two components: (1) a procedure for determining a comprehensive set of knowledge 
states, where each state is characterized in terms of a unique subset of mastered skills; and (2) 
a procedure for classifying examinees into one or another of the specified states. The 
procedure for determining a comprehensive set of knowledge states is based on the Boolean 
descriptive function given in Tatsuoka (1991). The procedure for classifying examinees 
involves comparing examinees' scored response vectors to the patterns expected within each 
of the specified knowledge states (Tatsuoka, 1983, 1985, and 1987). Missing data is expected 
to be a common problem for this approach because, although the procedure for determining 
the comprehensive set of knowledge states requires a large pool of items, the procedure for 
examinee classification can be performed with smaller (less expensive) item subsets. This 
approach to diagnostic classification is illustrated with data collected in the Survey of Young 
Adult Literacy, a nationwide survey of literacy skills conducted by the National Assessment 
of Educational Progress (NAEP) in 1985. 



Many procedures for diagnostic classification require specification of the universe of 
procedural bugs accounting for examinees' errors. Diagnostic classification is subsequently 
performed by comparing an examinee's observed performance on a representative set of items 
to the performances expected under each of the specified buggy procedures. When a good 
match is found, the examinee is classified as having that particular bug. 

For problems of typical size and complexity, however, the bug enumeration approach 
may not be feasible. An alternative, less fine-grained approach to diagnostic classification 
involves characterizing examinees' performances in terms of an underlying cognitive model of 
generalized problem-solving skills. Examinees' observed performances can then be compared 
to the performances expected at different mastery levels defined with respect to the 
underlying skills. Thus, the problem of enumerating all possible buggy procedures is replaced 
by two new problems: (1) identifying the unobservable, cognitive skills underlying 
performance, and (2) translating these skills into a comprehensive set of diagnostically 
relevant knowledge states. These two new problems may be more amenable to solution, 
especially in situations where a cognitive theory of performance is already available. 

In this paper we assume that the cognitive skills underlying performance have already 
been identified and describe (1) a procedure for determining a comprehensive set of 
diagnostically relevant knowledge states; and (2) a procedure for classifying examinees' 
observed response vectors into one or another of the specified knowledge states. The 
procedure for determining a comprehensive set of knowledge states is based on the Boolean 
descriptive function given in Tatsuoka (1991). The examinee classification procedure is 
a modification of the Rule Space classification procedure which allows for processing of 
response vectors containing missing data. Missing data is expected to be a common problem 
for these procedures because the method for determining a comprehensive set of knowledge 
states is defined with respect to a specific item pool. As will be seen later, this encourages 
the use of large diverse item pools for knowledge state definition and smaller (less expensive) 
item subsets for examinee classification. 

This new approach to diagnostic classification is described in the following sections. 
The procedure for determining a comprehensive set of knowledge states is presented first. 
Second, the Rule Space classification procedure is described. Third, differences between this 
approach and an approach based on latent class analysis are presented. Fourth, modifications 
to the Rule Space classification procedure which were developed to handle the expected 
missing data problem are described. Fifth, this approach is applied to the problem of 
diagnosing document processing skills. The data available for the application were collected 
in the Survey of Young Adult Literacy, a nationwide survey of literacy skills conducted by 
the National Assessment of Educational Progress (NAEP) in 1985. The unobservable 
ordinally-scaled variables assumed to be underlying performance on document processing 
tasks were derived from the work of Kirsch and Mosenthal (1990) who identified features of 
the items which were found to be highly correlated with proficiency in the domain. Finally, 
two new methods for analyzing the classification results are presented. 

Determining a Comprehensive Set of Knowledge States 

The process of determining a comprehensive set of knowledge states in a domain of 
interest begins with the specification of the elementary cognitive skills needed for mastery of 
the domain. In Birenbaum, Kelly and Tatsuoka (1992), for example, proficiency in the 
domain of elementary algebra is broken down into a set of 11 component skills including: 
(1) ability to apply the distributive law; (2) ability to apply arithmetic order of operations 
laws; (4) ability to recognize when it makes sense to subtract a term from both sides of an 
equation; and (5) ability to recognize when it makes sense to divide both sides of an equation 
by the coefficient of x. (For a list of the remaining seven skills, see Birenbaum et al., 1992.) 
Thus, although proficiency in solving elementary algebra problems is generally thought of as 
a unidimensional trait, a significant proportion of the variation in that trait may be accounted 
for by a diverse set of more elementary skills. 

Note that the elementary algebraic skills listed above are all reported in a 
dichotomized fashion. Also, they are all diagnostically relevant in the sense that knowledge 
of the subset of skills possessed by an examinee constitutes information which one would 
expect to find useful for remediation. These two characteristics of skills (i.e. ability to 
dichotomize and relevance to remediation) are required for successful application of the 
diagnostic classification procedures described below. 

Once the elementary cognitive skills underlying proficiency in the domain of interest 
have been identified, a comprehensive set of latent cognitive states can be determined by 
listing all possible subsets of skills mastered. For example, consider a model consisting of 
two skills, A_1 and A_2. The set of all possible subsets of these skills consists of the following 
four elements: 

1.) The examinee has mastered both A_1 and A_2. 

2.) The examinee has mastered A_1 but has not mastered A_2. 

3.) The examinee has mastered A_2 but has not mastered A_1. 

4.) The examinee has not mastered A_1 or A_2. 

Thus, the universe of all possible latent cognitive states can be specified in terms of a set of 
four states. Due to the combinatorial nature of this problem, however, this method of 
determining the universe of latent cognitive states will not always be feasible. In the 
document processing illustration presented below, for example, the cognitive model yielded a 
total of 22 skills. The corresponding set of all possible subsets of skills mastered would 
include 2^22 ≈ 4.2 × 10^6 elements, too many to consider, much less enumerate. 
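
The growth in the number of candidate states can be checked with a short sketch. It simply enumerates every mastery pattern for a given number of skills; the function name is illustrative and is not part of the procedure described in this paper.

```python
from itertools import product

def all_mastery_patterns(num_skills):
    """Return every mastery pattern as a tuple of 0/1 flags (1 = mastered)."""
    return list(product((1, 0), repeat=num_skills))

# Two skills give the four states listed in the text.
two_skill_states = all_mastery_patterns(2)

# For 22 skills the count is computed directly rather than enumerated.
count_22 = 2 ** 22
```

With two skills the function returns the four patterns in the order given in the list above; with 22 skills the count is 4,194,304.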
An alternative procedure for specifying the universe of all possible latent cognitive 
states is described in Tatsuoka (1991). (In Tatsuoka, the elementary cognitive skills are 
termed attributes. In this paper, the terms attribute and elementary cognitive skill are used 
interchangeably.) In this alternative procedure, characteristics of the available item pool are 
exploited to select a subset of states for further consideration. This is accomplished in two 
steps. First, in a step inspired by the work of Scheiblechner (1972) and Fischer (1973), each 
item in the pool is classified as to the subset of skills required for successful completion. 
This classification must be performed by someone who is familiar both with the items and 
with the cognitive model proposed for solving the items. The result is an incidence matrix Q 
whose order is the number of attributes (K) by the number of items (n). If item j requires 
mastery of skill k then Q_kj = 1, otherwise Q_kj = 0. Second, a Boolean descriptive function 
(BDF) is used to extract only those combinations of attributes which are represented in the 
available item pool. For example, consider a model involving ten attributes, A_1 through A_10. 
If every item that required mastery of A_10 also required mastery of A_9 then all states 
combining mastery of A_10 with nonmastery of A_9 would be excluded from the set of selected 
states (regardless of the mastery status specified for the remaining eight attributes). 

As this example shows, states that are psychologically and logically valid but not 
distinguishable from the available item pool would not be extracted by the BDF. Thus, this 
procedure encourages the use of a large diverse item pool. For best results, the pool should 
contain at least one item tapping each expected combination of skills. Note that the BDF 
only requires that the items be classified according to required attributes. Thus, a 
comprehensive set of knowledge states can be determined without actually administering all 
of the items in the pool. 



Classifying Observed Response Patterns 

The classification procedure described here involves comparing examinees' scored 
response patterns (X_i = [x_i1, ..., x_in], where x_ij is the response of the ith examinee to the jth 
item, 1 if correct, 0 if incorrect, and n is the number of items in the entire item pool) to the 
patterns expected within each of the specified knowledge states. First, each state is 
characterized by an ideal item response vector indicating the subset of items that would be 
successfully solved by an examinee in that state (X_s = [x_s1, ..., x_sn], s = 1, ..., S). The process of 
associating an ideal item response vector with a particular state is fairly straightforward: 
when the incidence matrix indicates that a particular item requires a particular combination of 
attributes, the ideal response to that item will be correct for all states having that combination 
of attributes and incorrect for all others. Once an ideal response pattern has been defined for 
each state, the Rule Space classification procedure (Tatsuoka, 1985, 1987) can be used to 
classify examinees' observed response patterns as indicating the pattern of attribute mastery 
associated with one or another of the specified cognitive states. 
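
The rule for deriving ideal item response vectors, and the earlier point that the item pool determines which states can be distinguished, can be illustrated with a brute-force sketch. The incidence matrix below is hypothetical (three attributes, four items), and the enumeration is only an illustration; it is not the Boolean descriptive function of Tatsuoka (1991).

```python
from itertools import product

# Hypothetical incidence matrix Q (attributes x items).
# Q[k][j] = 1 if item j requires attribute k.
# Every item that requires attribute 3 also requires attribute 2.
Q = [
    [1, 0, 1, 0],  # attribute 1
    [0, 1, 1, 1],  # attribute 2
    [0, 0, 0, 1],  # attribute 3
]

def ideal_response(mastery, q):
    """Item j is correct only if every attribute it requires is mastered."""
    n_items = len(q[0])
    return tuple(
        int(all(mastery[k] >= q[k][j] for k in range(len(q))))
        for j in range(n_items)
    )

# Group all 2^3 mastery patterns by the ideal response vector they produce.
groups = {}
for mastery in product((0, 1), repeat=len(Q)):
    groups.setdefault(ideal_response(mastery, Q), []).append(mastery)
```

In this toy pool the eight mastery patterns yield only six distinct ideal response vectors: patterns that combine mastery of attribute 3 with nonmastery of attribute 2 cannot be told apart from the corresponding patterns without attribute 3.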

A unique feature of the Rule Space classification procedure is that the comparison of 
examinees' observed response patterns to the various ideal response patterns is performed in a 
reduced space that has only two dimensions. These two dimensions were selected to capture 
variation in the response patterns that would be considered important from the vantage point 
of Item Response Theory (IRT). The first dimension corresponds to the IRT proficiency 
estimate θ̂. (Hereafter, θ̂ will be written as θ for simplicity.) This dimension is important 
because it describes variation in the response patterns that can be attributed to differences in 
examinee proficiency levels. The second dimension corresponds to the variable ζ, which is an 
index of how unusual a particular item response pattern is (Tatsuoka, 1984, 1985). The ζ 
associated with a particular response vector X_i is calculated as follows 



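\[ \zeta_i = \frac{f(X_i)}{\sqrt{\operatorname{Var}[f(X_i)]}}, \]

where

\[ f(X_i) = \sum_{j=1}^{n} \bigl(P_j(\theta_i) - x_{ij}\bigr)\bigl(P_j(\theta_i) - T(\theta_i)\bigr), \]

and

\[ T(\theta_i) = \frac{1}{n} \sum_{j=1}^{n} P_j(\theta_i). \]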

In the above equations, P_j(θ_i) is the probability of a correct response to the jth item by the ith 
examinee (as determined from the assumed IRT model), and T(θ_i) is the average probability 
of a correct response, calculated over all items. Note that P(θ_i) - X_i measures the deviation 
of the item response vector X_i from its expected value P(θ_i), and P(θ_i) - T(θ_i) measures the 
deviation of the expected value of the response vector X_i from the overall average probability 
of a correct response at θ_i. 

To illustrate the importance of ζ in comparing different item response patterns, Table 
1 lists sample ζ values for a five-item test calibrated under the Rasch model with difficulty 
parameters of -2, -1, 0, 1 and 2. Each of the patterns listed in the table corresponds to a 
number correct score of 3, and thus, has an associated IRT proficiency estimate of θ = .51. 
The table shows two things: first, the ζ variable has been successful at capturing variation in 
the response patterns which was not captured by the proficiency estimate θ; and second, the ζ 
values can be used to order the response patterns from those conforming to a Guttman pattern 
(ζ = -.85) to those conforming to a reverse Guttman pattern (ζ = 6.10). Thus, another way to 
think about ζ is that it indicates how well respondents' patterns accord with the assumed IRT 
model; low values indicate good fit (signaled by a Guttman pattern) and high values indicate 
poor fit (signaled by a reverse Guttman pattern). 
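
The two extreme values quoted above can be approximately reproduced with the following sketch of the unusualness index (written zeta in the code). It rests on assumptions that are not stated in the text: the index is taken to be the standardized quantity f / sqrt(Var f), with Var f = sum of P_j (1 - P_j)(P_j - T)^2, and the Rasch model is assumed to use the logistic scaling constant D = 1.7. The function names are illustrative.

```python
import math

D = 1.7  # assumed logistic scaling constant (not stated in the text)
DIFFICULTIES = [-2.0, -1.0, 0.0, 1.0, 2.0]

def rasch_probs(theta, difficulties, d=D):
    """Probability of a correct response to each item at proficiency theta."""
    return [1.0 / (1.0 + math.exp(-d * (theta - b))) for b in difficulties]

def zeta(responses, theta, difficulties):
    """Standardized index: f / sqrt(Var f), under the assumptions above."""
    p = rasch_probs(theta, difficulties)
    t = sum(p) / len(p)  # average probability correct over all items
    f = sum((pj - xj) * (pj - t) for pj, xj in zip(p, responses))
    var_f = sum(pj * (1.0 - pj) * (pj - t) ** 2 for pj in p)
    return f / math.sqrt(var_f)

# Both patterns have a number correct score of 3; theta = .51 as in the text.
zeta_guttman = zeta([1, 1, 1, 0, 0], 0.51, DIFFICULTIES)  # roughly -0.85
zeta_reverse = zeta([0, 0, 1, 1, 1], 0.51, DIFFICULTIES)  # roughly 6.1
```

Under these assumptions the Guttman pattern 11100 and the reverse Guttman pattern 00111 receive values close to the -.85 and 6.10 reported above, even though both share the same proficiency estimate.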

Tatsuoka (1983) has noted that "similar" response patterns will have similar values of 
θ and ζ. Thus, one can evaluate the "similarity" of response patterns by mapping them into 
the two-dimensional space formed by the Cartesian product of θ and ζ. This space is termed 
the rule space. We note here that the mapping from response pattern to ζ will only be 
one-to-one under certain conditions (Dibello & Baillie, 1991). However, a one-to-one mapping 
can be assumed for most Rule Space applications because the conditions under which the 
mapping will not be one-to-one, as derived in Dibello & Baillie, will rarely be found among 
data which fit an IRT model. 



Insert Table 1 Here 



After the ideal item response vectors associated with each of the possible latent 
cognitive states have been mapped onto the two-dimensional rule space, determination of skill 
mastery for a particular examinee can proceed according to the following steps. First, the 
examinee's observed item response vector is also projected onto the two-dimensional rule 
space. Second, a subset of admissible states is determined by applying an admissibility 
criterion to each possible state. The admissibility criterion is defined in terms of the 
Mahalanobis distance (D_is^2) between the examinee's point in the rule space (X_i, i = 1, ..., N) and 
the points associated with each of the ideal item response vectors (X_s, s = 1, ..., S). In particular, 
State s is admissible if 

\[ D_{is}^2 < \chi^2_{2,\alpha}, \]

where

\[ D_{is}^2 = (\theta_i - \theta_s)^2 \, I(\theta_s) + (\zeta_i - \zeta_s)^2, \]

I(θ_s) is the Fisher information associated with the estimate θ_s, and χ²(2, α) is the 
α-quantile of a chi-square random variable with 2 degrees of freedom. (We also say that State 
s is contained in the examinee's admissibility region.) Thus, an examinee's admissibility 
region contains the subset of states whose ideal item response vectors most closely resemble 
the examinee's observed item response vector, as determined by the Mahalanobis distance 
criterion. 
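
A minimal sketch of the admissibility check follows. It makes two assumptions that go beyond the text: the chi-square cutoff is read as the upper-alpha critical value, which for 2 degrees of freedom has the closed form -2 ln(alpha), and the Fisher information is evaluated at each state's proficiency value. The state coordinates are hypothetical.

```python
import math

def mahalanobis_sq(theta_i, zeta_i, theta_s, zeta_s, info_s):
    """Squared distance between an examinee's point and a state's point."""
    return (theta_i - theta_s) ** 2 * info_s + (zeta_i - zeta_s) ** 2

def chi2_2df_upper(alpha):
    """Upper-alpha critical value of a chi-square with 2 df (closed form)."""
    return -2.0 * math.log(alpha)

def admissible_states(point, states, alpha=0.10):
    """Return the names of states whose squared distance is below the cutoff."""
    theta_i, zeta_i = point
    cutoff = chi2_2df_upper(alpha)
    return [
        name
        for name, (theta_s, zeta_s, info_s) in states.items()
        if mahalanobis_sq(theta_i, zeta_i, theta_s, zeta_s, info_s) < cutoff
    ]

# Hypothetical states: (theta_s, zeta_s, Fisher information at theta_s).
states = {"r": (0.0, 0.0, 4.0), "q": (0.5, 0.5, 4.0), "t": (2.0, 1.0, 4.0)}
region = admissible_states((0.2, 0.1), states)
```

For the hypothetical examinee at (0.2, 0.1), States r and q fall inside the admissibility region and State t does not.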

Let r be a state in the admissibility region determined for examinee i. The posterior 
probability that this examinee has the pattern of skill mastery associated with State r can be 
determined as follows 



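\[ P(r \mid \theta_i, \zeta_i) = \frac{P(\theta_i, \zeta_i \mid \theta_r, \zeta_r)\, P(r)}{\sum_{s=1}^{S} P(\theta_i, \zeta_i \mid \theta_s, \zeta_s)\, P(s)}, \]

where P(r) and P(s) represent prior probabilities for states r and s (s = 1, ..., S), respectively. The 
conditional probability P(θ_i, ζ_i | θ_s, ζ_s) is taken to be bivariate normal with mean 

\[ (\theta_s, \zeta_s) \]

and variance-covariance matrix 

\[ \begin{pmatrix} I(\theta_s)^{-1} & 0 \\ 0 & 1 \end{pmatrix}. \]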



At this point, two alternative methods for determining attribute mastery classifications 
are available. First, in a manner similar to a latent class analysis, one could select the best 
available description of the examinee's true mastery profile by selecting that state with the 
highest posterior probability. For example, if State r had the highest posterior probability of 
all the states in the examinee's admissibility region, then the examinee would be classified 
into State r, or in other words, he or she would be diagnosed as having the pattern of attribute 
mastery associated with State r. Alternatively, it may be more appropriate to estimate an 
attribute mastery vector for each examinee by taking a weighted average of the attribute 
mastery designations associated with each of the states in the admissibility region. As an 
example, consider an admissibility region consisting of two states with the following attribute 
mastery patterns: {State r: 100} and {State q: 110}. A weighted average of these mastery 
designations would provide the following vector of attribute mastery values: 

\[ P(A_1) = 1.0 \]

\[ P(A_2) = \frac{P(q \mid \theta_i, \zeta_i)}{P(r \mid \theta_i, \zeta_i) + P(q \mid \theta_i, \zeta_i)} \]

\[ P(A_3) = 0.0 \]

where P(r | θ_i, ζ_i) and P(q | θ_i, ζ_i) represent posterior probabilities for States r and q, 
respectively. Note that, in this alternative method, an examinee's mastery status is described 
probabilistically rather than absolutely. This alternative method may be more or less 
appropriate depending on the ways in which the classification results are to be used. 
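
Both classification methods can be sketched together. The sketch assumes a bivariate normal likelihood centered at each state's point with covariance diag(1 / I, 1), where I is the Fisher information at the state's proficiency value, and it normalizes posteriors over the states in the admissibility region only. State coordinates, priors, and function names are hypothetical.

```python
import math

def bvn_density(theta_i, zeta_i, theta_s, zeta_s, info_s):
    """Bivariate normal density, mean (theta_s, zeta_s), cov diag(1/info_s, 1)."""
    d2 = (theta_i - theta_s) ** 2 * info_s + (zeta_i - zeta_s) ** 2
    return math.sqrt(info_s) / (2.0 * math.pi) * math.exp(-0.5 * d2)

def posteriors(point, states, priors):
    """Posterior probability of each admissible state (Bayes' rule)."""
    joint = {s: bvn_density(*point, *states[s]) * priors[s] for s in states}
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}

def attribute_probabilities(post, patterns):
    """Posterior-weighted average of the states' attribute mastery patterns."""
    n_attr = len(next(iter(patterns.values())))
    return [sum(post[s] * patterns[s][k] for s in post) for k in range(n_attr)]

# Two admissible states with the mastery patterns used in the text.
states = {"r": (0.0, 0.0, 4.0), "q": (0.5, 0.5, 4.0)}  # hypothetical points
patterns = {"r": (1, 0, 0), "q": (1, 1, 0)}
priors = {"r": 0.5, "q": 0.5}                            # hypothetical priors

post = posteriors((0.2, 0.1), states, priors)
best_state = max(post, key=post.get)            # first method
probs = attribute_probabilities(post, patterns)  # second method
```

The first method reports a single state; the second reports a mastery probability of 1 for the first attribute, 0 for the third, and the normalized posterior of State q for the second.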



Comparison to Latent Class Analysis 

Since latent class analysis also has as its objective the classification of observed 
response vectors into one or more of a set of latent cognitive states where each state is 
characterized by an idealized pattern of correct and incorrect responses (Lazarsfeld and Henry, 
1960; Goodman, 1974; Macready and Dayton, 1980) it is useful to examine the differences 
between these two approaches. 

A unique feature of the latent class approach is that each latent cognitive state is 
additionally characterized by a set of conditional probabilities, α_s and β_s. The probability α_s 
is the conditional probability of a correct response to any item for which the idealized pattern 
X_s indicates a correct response, given that the examinee has the pattern of skill mastery 
associated with State s. Similarly, β_s is the conditional probability of a correct response to 
any item for which the idealized pattern X_s indicates an incorrect response, given that the 
examinee has the pattern of skill mastery associated with State s. From specified values of α_s 
and β_s it is possible to calculate p_s(X_i), the posterior probability that an examinee belongs to 
latent class s, i.e. has the pattern of skill mastery associated with State s, given their observed 
pattern of correct and incorrect responses, X_i. Diagnostic classification can then be 
performed by classifying each examinee into the class with the highest posterior probability. 
Note that the Rule Space approach does not require the specification of conditional 
probabilities α_s or β_s. 
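
For comparison, the latent class posterior can be sketched as follows. The sketch assumes local independence of item responses within a class and a set of class proportions used as priors; neither is spelled out in the text, and all numerical values are hypothetical.

```python
def lc_likelihood(x, ideal, alpha, beta):
    """Likelihood of response vector x for a class with ideal pattern `ideal`."""
    like = 1.0
    for xj, ej in zip(x, ideal):
        p_correct = alpha if ej == 1 else beta
        like *= p_correct if xj == 1 else (1.0 - p_correct)
    return like

def lc_posteriors(x, ideals, alphas, betas, class_props):
    """Posterior probability of each latent class given response vector x."""
    joint = {
        s: class_props[s] * lc_likelihood(x, ideals[s], alphas[s], betas[s])
        for s in ideals
    }
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}

ideals = {"r": (1, 0, 1, 0, 0), "q": (1, 0, 1, 0, 1)}
alphas = {"r": 0.9, "q": 0.9}      # hypothetical
betas = {"r": 0.1, "q": 0.1}       # hypothetical
class_props = {"r": 0.5, "q": 0.5}  # hypothetical

post = lc_posteriors((1, 0, 1, 0, 1), ideals, alphas, betas, class_props)
```

With these values an examinee whose responses match the ideal pattern of class q is assigned to q with posterior probability 0.9.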

A second difference between the Rule Space approach and a latent class approach is 
that the latent class approach provides very little guidance in the specification of knowledge 
states. By contrast, in the Rule Space approach, the comprehensive set of knowledge states 
is completely determined by the specification of the underlying cognitive model and the 
characteristics of the available item pool. If the item pool is developed to contain items 
tapping each of the relevant cognitive skills, then all of the relevant knowledge states will be 
extracted. 

A third way in which the current approach differs from a latent class approach is that 
the current approach provides detailed information about which skills the examinee has and 
has not mastered. By contrast, the latent class approach merely provides information about 
which state the examinee has been classified into. Since states are not necessarily broken 
down into their more elementary cognitive components, the link to an effective remediation 
strategy is not as direct. 

The Missing Data Modification 

In the classification procedure outlined above, each examinee's observed item response 
vector is compared to a single set of ideal item response vectors. Thus, it is assumed that 
each examinee is presented the same subset of items. In some testing situations, however, it 
will not be possible to administer the entire item pool to each examinee. In many large-scale 
testing programs, for example, multiple-matrix item sampling designs are used to efficiently 
measure population characteristics from sparse matrix samples of item responses (Mislevy, 
Beaton, Kaplan and Sheehan, 1992). In these designs, different subsets of items are presented 
to different subsets of examinees. The NAEP data analyzed below provide an example. 
These data were collected under an item sampling design, called balanced incomplete block 
(BIB) spiralling, in which the item pool is first divided into blocks and subsets of blocks are 
then grouped into test booklets such that each pair of blocks appears together in exactly one 
booklet. This design violates the assumption of no missing data since each examinee is only 
presented a subset of the entire item pool. This section describes a modification to the rule 
space procedure which was developed to allow processing of data sets containing missing 
item responses. Note that the procedure for determining a comprehensive set of knowledge 
states is not affected by this modification, since that procedure requires only the incidence 
matrix, not examinee response vectors. 

To allow for different patterns of missing responses among different examinees, the 
missing data modification described here has been tailored to match the particular set of items 
presented to an examinee. That is, only those items which were actually administered to an 
examinee are considered during the classification of that particular examinee. This is 
accomplished in two steps: first, all 'not presented' items are masked out of the examinee's 
observed item response vector, and second, these same items are masked out of each state's 
ideal item response vector. Classification decisions are then made by comparing the 
examinee's reduced item response vector to each of the states' reduced ideal item response 
vectors. That is, both the examinee's reduced item response vector and each of the reduced 
ideal item response vectors are projected into the two-dimensional rule space and the Bayes 
decision rule described previously is applied. Note that this modification involves a great 
deal of additional computation since the ideal item response vectors associated with each state 
must be projected into the rule space N times, once for each examinee. By contrast, in the 
original rule space procedure the ideal item response vectors are projected into the rule space 
once and this single projection is assumed to serve for all examinees. 

Note that this approach does not involve any assumptions about the examinee's 
probable responses to missing items. Rather, a masking procedure is used to remove not- 
presented items from consideration entirely. An unintended result of the masking of ideal 
item response vectors is that two or more states may then be projected onto identical points in 
the rule space. When this occurs, it is an indication that the sampling design had not allowed 
for testing of all relevant attributes. To illustrate this point, consider a five-item test in which 
each item tests mastery of a single attribute. Two possible ideal item response vectors for 
this test are listed below. 

o Ideal response pattern for State r: 10100 
o Ideal response pattern for State q: 10101 

Since States r and q differ only in their response to item 5, the reduced ideal item response 
vectors associated with these two states will be indistinguishable with respect to any item 
subset which does not include item 5. Thus, under the tailored classification procedure 
described above, some examinees may be classified as belonging either to State r or to State 
q with no way of distinguishing between the two. Two methods for dealing with this 
problem are proposed. Both methods involve first applying the modified classification 
procedure described above, and then applying an additional selection criterion only if the 
examinee has been classified as belonging to two or more states that are indistinguishable 
with respect to the subset of items administered. 
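
The masking step and the resulting collapse of states can be sketched as follows, using the two ideal response patterns from the example above. The function names are illustrative. States whose reduced ideal vectors are identical necessarily project onto the same point in the rule space, so the sketch groups states by their reduced vectors.

```python
def mask(vector, administered):
    """Keep only the entries for items that were actually presented."""
    return tuple(v for v, shown in zip(vector, administered) if shown)

def indistinguishable_clusters(ideals, administered):
    """Group states whose reduced ideal response vectors are identical."""
    clusters = {}
    for state, ideal in ideals.items():
        clusters.setdefault(mask(ideal, administered), []).append(state)
    return list(clusters.values())

ideals = {"r": (1, 0, 1, 0, 0), "q": (1, 0, 1, 0, 1)}

# Item 5 not presented: the two states collapse into one cluster.
without_item5 = indistinguishable_clusters(
    ideals, [True, True, True, True, False]
)

# All five items presented: the two states remain distinct.
with_item5 = indistinguishable_clusters(ideals, [True] * 5)
```

The same mask would be applied to the examinee's observed response vector before projecting it into the rule space.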

The first method proposed for dealing with the problem of indistinguishable states 
(such as States r and q above, if Item 5 were not administered) is appropriate when the 
primary purpose of the diagnostic procedure is to select a remediation program for the 
examinee. Under this method, the examinee is assigned to one or another of the possible 
states by selecting that state which indicates the fewest attributes mastered. In the 
example listed above, the examinee would be classified into State r. Note that this method 
assumes that the loss of providing remediation when remediation is not required is less than 
the loss of failing to remediate when remediation is required. 

The second method proposed for dealing with the problem of indistinguishable states 
is appropriate when remediation is not the primary concern or when the losses associated with 
the two types of remediation errors are assumed to be equal. In this method, final 
classification decisions are made by comparing the prior probabilities associated with each of 
the possible states. In the example listed above, the examinee would be classified into State r 
or State q depending on which had the higher prior probability. The rationale for using prior 
probabilities to compare states derives from the result that, conditional on a previous 
classification to a cluster of indistinguishable states, the posterior probabilities of all states in 
that cluster are proportional to their prior probabilities. A proof of this result is given in 
Appendix A. 
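
The two selection criteria can be sketched as simple tie-breaking rules applied to a cluster of indistinguishable states. The attribute patterns follow the five-item example, in which each item tests a single attribute; the prior probabilities are hypothetical.

```python
def fewest_attributes(cluster, patterns):
    """First method: choose the state with the fewest mastered attributes."""
    return min(cluster, key=lambda s: sum(patterns[s]))

def highest_prior(cluster, priors):
    """Second method: choose the state with the highest prior probability."""
    return max(cluster, key=lambda s: priors[s])

cluster = ["r", "q"]
patterns = {"r": (1, 0, 1, 0, 0), "q": (1, 0, 1, 0, 1)}
priors = {"r": 0.02, "q": 0.05}  # hypothetical prior probabilities

choice_remediation = fewest_attributes(cluster, patterns)
choice_prior = highest_prior(cluster, priors)
```

Under the first rule the examinee is placed in State r; under the second rule, with these hypothetical priors, the examinee is placed in State q.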



An Application to the Domain of Document Literacy 

The procedures outlined above have been applied to the document literacy data 
collected in the Survey of Young Adult Literacy, a nationwide survey of literacy skills 
conducted by NAEP in 1985. This dataset includes 61 items classified as measuring 
document literacy, that is, the knowledge and skills needed to process information stored in 
non-prose formats such as tables, charts, or schedules (Kirsch and Jungeblut, 1986). These 
items were administered by trained interviewers: the examinee was handed a document, such 
as a page from a phone book or bus schedule, and was then asked to respond to one or two 
questions which required processing of at least some of the information stored in the 
document. The cognitive model assumed to be underlying performance in this domain was 
adapted from the work of Kirsch and Mosenthal (1990) who identified features of the items 
which were later shown to be highly correlated with the IRT difficulty parameters of the 
items (Sheehan and Mislevy, 1990). 

The item feature variables identified by Kirsch and Mosenthal are listed in Table 2. 
These variables were originally measured on an ordinal scale. We have translated them into a 
set of 22 dichotomously scored attributes by coding the incidence matrix as indicated in Table 
2. To illustrate this procedure, consider the coding listed for the Degree of Correspondence 
variable. This variable measures the degree to which the phrasing in the stem portion of the 
item matches the phrasing in the document which the item refers to. It is scored on a 1 to 5 
scale with lower values indicating more direct correspondence and thus, less difficulty; and 
higher values indicating less direct correspondence and thus, more difficulty. The first three 
ordered levels were translated into a set of three dichotomously scored attributes as follows: if 
an item is classified as requiring level 1 correspondence skills then an examinee would have 
to have mastered attribute C1 in order to correctly solve that item; if an item is classified as 
requiring level 2 skills then an examinee would have to have mastered attributes C1 and C2 
in order to correctly solve that item; if an item is classified as requiring level 3 skills then an 
examinee would have to have mastered attributes C1, C2 and C3 in order to correctly solve 
that item. Levels 4 and 5 are translated analogously. Thus, the order relationships inherent 
in the ordinal levels of the original variables have been translated into order relationships 
among the attributes through the coding of the incidence matrix. 



Insert Table 2 Here 



Note that, under this coding scheme, it is impossible for an examinee to have mastered 
attribute C5 without also having mastered attributes C1 through C4. Similar restrictions apply 
to the other attributes. Thus, the attributes are now hierarchically ordered. This hierarchical 
ordering of the attributes is responsible for reducing the number of valid states from 2^22 to 
7,776, or 6 × 6 × 3 × 3 × 6 × 4. The final number of valid states is much lower, however, 
since the item pool does not test all hierarchically-valid combinations of the attributes. That 
is, in the particular item pool developed for the NAEP literacy survey, items requiring 
medium to high mastery levels on some cognitive variables tended to also require medium to 
high mastery levels on other cognitive variables. Similarly, items requiring medium to low 
mastery levels on some cognitive variables tended to also require medium to low mastery 
levels on other cognitive variables. Since most combinations were not represented in the item 
pool (for example, Correspondence at Level 1 and Distractor at Level 5), the procedure for 
determining the subset of latent cognitive states to be considered found only 157 valid states. 
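
The effect of the hierarchical coding on the state count can be verified with a short sketch. The number of ordered levels per variable is inferred here from the factorization 6 × 6 × 3 × 3 × 6 × 4 (each factor is one more than the number of levels, to allow for no mastery); the assignment of level counts to particular variables in Table 2 is not assumed.

```python
from math import prod

# Ordered levels per variable, inferred from the factorization in the text.
LEVELS = [5, 5, 2, 2, 5, 3]

def required_attributes(level, n_levels):
    """An item at ordinal `level` requires attributes 1..level of a variable."""
    return [1 if k < level else 0 for k in range(n_levels)]

n_attributes = sum(LEVELS)                     # dichotomous attributes
n_unrestricted = 2 ** n_attributes             # all subsets of attributes
n_hierarchical = prod(m + 1 for m in LEVELS)   # hierarchically valid states

# Coding for an item at level 3 of a five-level variable: C1, C2 and C3.
level3_coding = required_attributes(3, 5)
```

The level counts sum to the 22 attributes reported above, and the hierarchical restriction reduces 4,194,304 candidate states to 7,776.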

The nationally representative adult literacy sample included approximately 3,600 
scientifically selected examinees in the 21 to 25 age group. The subset of items presented to 
each examinee was determined through a BIB item sampling design in which the item pool 
was first divided into seven nonoverlapping blocks, and subsets consisting of three different 
blocks were subsequently arranged into seven distinct booklets such that each pair of blocks 
appeared together in exactly one booklet. The booklets were then spiralled into the 
population so that each booklet was administered to a random subsample of approximately 
500 examinees. Because the original blocks differed in the number of document items they 
contained, the number of items in the resulting booklets also differed: from a low of 19 to a 
high of 41. These data were modeled using a two parameter logistic IRT model. Although 
item parameters were estimated using all of the available data, only those booklets which 
contained 30 or more items were included in the subset of data used to develop the diagnostic 
model. Booklets containing fewer than 30 items were excluded because θ estimates based on 
fewer than 30 items were considered to be too imprecise for use in classification. The final 
sample included three booklets, or three random subsamples containing a total of 1,509 
examinees. 
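
A design with the stated properties (seven blocks, seven booklets of three blocks each, every pair of blocks together exactly once) can be constructed and checked as sketched below. The cyclic construction from the difference set {0, 1, 3} modulo 7 is one standard way to obtain such a design; it is offered as an illustration and is not claimed to be the block-to-booklet assignment actually used in the survey.

```python
from collections import Counter
from itertools import combinations

def cyclic_bib(n_blocks=7, base=(0, 1, 3)):
    """Build booklets by shifting a base set of blocks modulo n_blocks."""
    return [
        tuple(sorted((b + shift) % n_blocks for b in base))
        for shift in range(n_blocks)
    ]

booklets = cyclic_bib()

# Count how often each pair of blocks shares a booklet.
pair_counts = Counter(
    pair for booklet in booklets for pair in combinations(booklet, 2)
)

# Count how many booklets each block appears in.
block_counts = Counter(block for booklet in booklets for block in booklet)
```

In the resulting design all 21 pairs of blocks appear together exactly once, and each block appears in three booklets.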

The projection of examinee response vectors into the two-dimensional rule space is 
presented in Figure 1. Examinees' θ values are plotted along the x-axis; examinees' ζ values 
are plotted along the y-axis. The plot shows a scatter of points in the θ range from -3 to 3 
and the ζ range from -3 to 3. Figure 2 provides the projection of the 157 latent cognitive 
states into the rule space. As can be seen, there are very few states in the high θ region. 
Thus, we should not expect to find high classification rates among high proficiency 
examinees. Figure 3 shows the prior probabilities assumed for each state. Prior probabilities 
were assumed to be proportional to the height of the bivariate normal density with mean (0,0) 
and covariance matrix equal to the identity. This prior was selected because (a) item 
parameters were estimated under the constraint of a standardized population distribution of θ; 
and (b) since ζ is defined in standardized form, it is also expected to have a mean of zero 
and a standard deviation of one, whenever the IRT model fits. 
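
As a minimal sketch of how such priors might be computed, the code below evaluates the standard bivariate normal density at each state's rule space coordinates. The coordinates are hypothetical, and rescaling the heights so that they sum to one is an assumption (the text only requires proportionality).

```python
import math


def standard_bivariate_normal_height(theta, zeta):
    """Density of a bivariate normal with mean (0, 0) and identity covariance."""
    return math.exp(-0.5 * (theta ** 2 + zeta ** 2)) / (2.0 * math.pi)


def state_priors(state_points):
    """Priors proportional to the density height at each (theta, zeta) point.

    Normalizing to a sum of one is an assumption made for convenience.
    """
    heights = [standard_bivariate_normal_height(t, z) for t, z in state_points]
    total = sum(heights)
    return [h / total for h in heights]


# Hypothetical (theta, zeta) coordinates for three states.
example_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0)]
priors = state_priors(example_points)
print(priors)
```

States located far from the origin, such as those in the high θ region, receive small priors under this choice.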



Insert Figures 1, 2 and 3 Here 



Using the procedure described previously (with an α-level of .10), an admissibility 
region was determined for each examinee. A Bayes decision rule was then used to classify 
examinees into their "most possible" state. The classification results are summarized by 
classification outcome category in Table 3. The results show that 40% of the examinees were 
classified into a unique state, an additional 33% were classified into a set of two 
indistinguishable states, an additional 13% were classified into a set of three indistinguishable 
states, and so on. Overall, 90 percent of the examinees were classified into one or more of 
the 157 states. The fact that large numbers of examinees were not classified into a unique 
state indicates that the subset of items administered to each examinee did not test all of the 
relevant skills. This problem can be ameliorated in future document literacy assessments by 
specifying skill coverage as one of the characteristics to be considered in defining item 
subsets. 



Insert Table 3 Here 



Table 3 also lists the average number of items completed by an examinee in each 
classification outcome category. These values show that the probability of being classified 
into a unique state increases with the number of items completed. Note however that the 147 
examinees who were not classified also completed a large number of items. This indicates 
that the classification failure was not due to insufficient data, but rather, to the fact that these 
examinees were responding in ways which were not consistent with the assumed cognitive 
model. Thus, the cognitive model accounts for the document processing behaviors of only 
90 percent of the examinees in this sample. 

The number and percent of classified examinees are summarized by proficiency group 
and gender in Table 4. The low, medium and high proficiency groups were defined by 
dividing the original data set into thirds according to examinees' estimated θ values. Thus, 
the 503 examinees with the lowest θ values were classified into the low proficiency group, 
the 503 examinees with the highest θ values were classified into the high proficiency group, 
and the remaining examinees were classified into the medium proficiency group. The table 
shows that the model works best for low proficiency examinees (95% classified) as opposed 
to medium or high proficiency examinees (88% classified). The breakdown by gender shows 
that females are more likely than males to be classified (93% as opposed to 87%). 



Insert Table 4 Here 



Analysis of Attribute Mastery Probabilities 

A vector of attribute mastery probabilities can be estimated for each classified 
examinee. For those examinees who were classified into a unique state (as was the case for 
600 examinees in our sample) the probability of mastering any particular attribute will be 
either zero or one, depending on whether that attribute was included in the subset of attributes 
mastered defined for that state. (Note that we are ignoring the issue of classification error 
here. That issue is treated briefly at the end of this section.) When an examinee has been 
classified as belonging to a subset of two or more indistinguishable states, then the 
examinee's vector of attribute mastery probabilities can be determined by taking a weighted 
average of the attribute mastery probabilities defined for each state in the subset. Weights are 
selected to be proportional to the states' prior probabilities since, as was described previously, 
the posterior probability of each state in the subset is proportional to its prior probability. To 
illustrate this calculation, consider a cognitive model consisting of three attributes {A1, A2, 
A3}, and an examinee who has been classified as belonging either to State r or to State q, 
where States r and q have the following subsets of attributes mastered: {State r: A1}, and 
{State q: A1, A2}. The vector of attribute mastery probabilities for this examinee is 
calculated as follows: 

P(A1) = 1.0 

P(A2) = P(q)/[P(r) + P(q)] 

P(A3) = 0.0 

where P(r) and P(q) represent prior probabilities for States r and q, respectively. Note that 
this procedure does not require us to select a unique "best" state for the examinee. 
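
The weighted average can be sketched in a few lines. The function name and the prior values 0.3 and 0.1 are hypothetical and serve only to reproduce the three-attribute example.

```python
def attribute_mastery_probabilities(states, priors, attributes):
    """Prior-weighted average of 0/1 mastery indicators over candidate states.

    states: dict mapping state name -> set of attributes mastered in that state
    priors: dict mapping state name -> prior probability of that state
    """
    total = sum(priors[name] for name in states)
    return {
        attribute: sum(
            priors[name] for name, mastered in states.items() if attribute in mastered
        ) / total
        for attribute in attributes
    }


# States r and q from the example; the prior values are hypothetical.
states = {"r": {"A1"}, "q": {"A1", "A2"}}
priors = {"r": 0.3, "q": 0.1}
probs = attribute_mastery_probabilities(states, priors, ["A1", "A2", "A3"])
print(probs)
```

With these assumed priors, the probability for A2 is P(q)/[P(r) + P(q)] = 0.1/0.4 = 0.25, while A1 and A3 take the values one and zero regardless of the priors.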

This method of calculating attribute mastery probabilities was applied to each of the 
1,362 examinees who were classified in this study. The resulting attribute mastery 
probabilities were classified by proficiency group and gender and then analysed using a 
multivariate repeated measures analysis of variance, as described for instance in Myers 
(1979). A standard analysis of variance would not have been appropriate for these data 
because the hypothesis of multisample sphericity is violated. The results of this analysis are 
summarized in Table 5. (For reasons described below, the results given in Table 5 are based 
on 15 rather than 22 attributes.) 



Insert Table 5 Here 



The analysis of variance results reported in Table 5 provide evidence of three 
significant effects: proficiency group, attributes, and the attribute by proficiency group 
interaction. These results indicate that the attributes are differentially difficult and that 
examinees in different proficiency groups tend to have different attribute mastery profiles. 
The nonsignificance of the gender effects is interesting because it indicates that, for each 
attribute analysed, the average probability of mastery values calculated for males and females 
were very similar. Thus, the data provide no evidence of a gender difference in mastery of 
elementary document processing skills. 

Table 6 presents the mean probability of mastery values estimated for each attribute. 
The different attribute mastery profiles obtained for low, medium and high proficiency 
examinees are clearly illustrated. The differential difficulty of the attributes is also shown. 
Note that, for each variable, the lowest classification level is mastered with a probability of 
1.0 by examinees in all three proficiency groups. Thus, there is strong justification for 
excluding level 1 items from future document literacy assessments. Another thing to note is 
that attributes C3 and C4 have equal attribute mastery values in all three proficiency groups. 
This result is due to the fact that the item pool did not contain any items classified as level 3 
on the correspondence variable. Thus, the probabilities listed for attribute C3 are no more 
than an artifact of the coding scheme developed for the incidence matrix. Because we have 
no valid information about mastery probabilities for attribute C3, and because we know for 
sure that all examinees have mastered attributes C1, D1, I1, O1, S1, and T1, these seven 
attributes were not included in the analysis of variance described previously. 



Insert Table 6 Here 



The last column in Table 6 provides the mean probability of mastery values estimated 
for the total sample of examinees. These values were obtained by taking an unweighted 
average of the mean values estimated in each of the three proficiency groups. Differences in 
these means were investigated using the multiple pairwise comparisons procedure described in 
Keselman, Keselman and Shaffer (1991). This procedure is appropriate because it uses 
estimates of variance for each comparison that are unbiased under violation of multisample 
sphericity. Using an overall α-level of .05, five clusters of similarly difficult attributes were 
identified: {C5, D5}, {S3, D3, C2}, {D3, C2, O2}, {C2, O2, I4} and {I3, T2, I2, S2}. One 
thing to note about these clusters is that, except for I3 and I2, different levels of the same 
variable never appear together in the same cluster. Thus, for most variables, collapsing of 
levels is not indicated. 

An alternative procedure for determining attribute mastery probabilities involves taking 
a weighted average of the attribute mastery designations defined for each state in the 
examinee's admissibility region. Although this alternative procedure was not used in this 
paper, we wish to note that it allows for an explicit treatment of classification error since 
weights may be defined to be proportional to states' posterior probabilities. 



A Tree Representation 
of the Classification Results 

Often, diagnostic classification models are used to route examinees through 
computerized instructional systems. To assist in that purpose, this section presents a tree 
representation of the classification results obtained in this study. 

The first step in devising a tree representation for a set of classification results 
involves selecting a single "best" state for each examinee who was classified into a subset of 
two or more states which were found to be indistinguishable with respect to the subset of 
items administered. As indicated earlier, this can be done by assigning examinees to states 
based on a loss function approach or by comparing states' prior probabilities. Because the 
primary purpose of the tree representation is to assist in routing examinees through 
computerized instructional systems, the loss function approach is the natural choice. This 
approach was applied to the document literacy classification results by assigning examinees to 
states such that the resulting classification indicated the least number of attributes mastered. 
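
A minimal sketch of this selection rule follows. The function name is hypothetical, and breaking ties alphabetically by state name is an assumption, since the text does not say how ties were resolved.

```python
def pick_conservative_state(candidates):
    """Return the candidate state with the fewest mastered attributes.

    candidates: dict mapping state name -> set of attributes mastered.
    Ties are broken by state name (an assumption, for determinism only).
    """
    return min(candidates, key=lambda name: (len(candidates[name]), name))


# Two indistinguishable states from the earlier three-attribute example.
candidates = {"r": {"A1"}, "q": {"A1", "A2"}}
best = pick_conservative_state(candidates)
print(best)
```

Choosing the state with the fewest mastered attributes errs on the side of remediating a skill the examinee may already possess rather than skipping one that is still missing.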

After all examinees have been assigned to their single "best" state, a subset of states 
which accounts for a large portion of the classified examinees must be determined. The 
subset of states selected for the document literacy tree representation consisted of all states 
with an observed frequency of seven or more examinees. This subset included 30 states and 
accounted for 92% of the classified examinees. The states included in this subset are listed in 
Table 7. The table also provides the attribute mastery designations for each state. As 
expected, states with high θ values tend to have many mastered attributes and states with 
low θ values tend to have fewer mastered attributes. The column of state frequencies shows 
that this subset of states accounts for a total of 1,249 examinees, or 83% of the original 
sample. 



Insert Table 7 Here 



To develop a tree representation of the data given in Table 7, we start by plotting each 
state as a node and then draw arcs from one node to another, or from one state to another, to 
indicate transition relationships among the states. A transition from one state to another is 
said to be possible whenever the set of attributes associated with the first state is the largest 
available subset of the set of attributes associated with the second state. Thus, arcs connect 
lower states to higher states, where a higher state is defined as a state having at least one 
more attribute mastered. In some instances, of course, the next higher state will have two or 
more additional attributes mastered. The tree representation of the document processing 
classification results is given in Figure 4. 
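
One way to read the transition rule is as a covering relation: an arc runs from a lower state to a higher state when the lower state's attribute set is a proper subset of the higher state's set and no other available state lies strictly between them. The sketch below implements that reading, which is an interpretation rather than a procedure stated in the text. It uses five states from Table 7, labeled by their unmastered attributes, and lists only the attributes C5, D4 and D5 because all other attributes are mastered in each of these states.

```python
def transition_arcs(states):
    """Return (lower, higher) arcs of the covering relation among attribute sets.

    states: dict mapping node label -> frozenset of mastered attributes.
    """
    arcs = []
    for low, low_set in states.items():
        for high, high_set in states.items():
            if not low_set < high_set:  # require a proper subset
                continue
            has_intermediate = any(
                low_set < mid_set < high_set for mid_set in states.values()
            )
            if not has_intermediate:
                arcs.append((low, high))
    return sorted(arcs)


# Node labels name the attributes NOT mastered; "none" is the state of
# perfect knowledge. Sets list mastered attributes among C5, D4 and D5.
states = {
    "C5,D4": frozenset(),
    "C5": frozenset({"D4", "D5"}),
    "D4": frozenset({"C5"}),
    "D5": frozenset({"C5", "D4"}),
    "none": frozenset({"C5", "D4", "D5"}),
}
arcs = transition_arcs(states)
print(arcs)
```

Under this reading, the lowest of the five states has two outgoing arcs, which give rise to two alternative routes to the state of perfect knowledge.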



Insert Figure 4 Here 



The node labels in Figure 4 identify the subset of attributes which would not be 
mastered by an examinee in the corresponding knowledge state. Thus, an examinee who is 
classified as having mastered all attributes except Correspondence Level 5 and Distractor 
Levels 4 and 5 would be assigned to the node labeled "C5,D4". The alternative remediation 
strategies available for this examinee are indicated by the two paths from node "C5,D4" to the 
state of perfect knowledge (represented by the blank node at the top of the figure). Path 1 
progresses from "C5,D4" to "C5" and then to the blank node; Path 2 progresses from 
"C5,D4" to "D4", then to "D5" and then to the blank node. Path 1 corresponds to a 
remediation strategy in which the two distractor attributes are remediated first; Path 2 
corresponds to a remediation strategy in which the correspondence attribute is remediated 
first. One way to choose between these two alternative remediation strategies is to consider 
the frequency values listed in Table 7. Path 1 has a frequency of 7 (7 examinees located at 
node "C5"); Path 2 has a frequency of 83 (59 examinees located at node "D4" and an 
additional 24 examinees located at node "D5"). Thus it is much more likely for an examinee 
to have mastered attribute "C5" before having mastered attributes "D4" and "D5" than the 
other way around. This suggests that a remediation strategy based on Path 2 has a higher 
probability of success than one based on Path 1. 
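
The path comparison amounts to summing the Table 7 frequencies of the intermediate nodes on each path. A small sketch, with hypothetical variable names:

```python
# Frequencies of the intermediate nodes, taken from Table 7.
node_frequency = {"C5": 7, "D4": 59, "D5": 24}

# Intermediate nodes visited between "C5,D4" and the state of perfect knowledge.
paths = {
    "Path 1": ["C5"],
    "Path 2": ["D4", "D5"],
}

path_frequency = {
    name: sum(node_frequency[node] for node in nodes)
    for name, nodes in paths.items()
}
preferred = max(path_frequency, key=path_frequency.get)
print(path_frequency, preferred)
```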


Discussion 



This paper has shown that the Rule Space approach to diagnostic classification can be 
satisfactorily applied to data sets containing large amounts of missing data. With respect to 
the analysis of the NAEP document literacy data, there are three major findings to report: 

(1) For 40% of examinees, the Rule Space approach provided a precise diagnostic 
classification. That is, it indicated the particular subset of elementary document processing 
skills mastered by each examinee. 

(2) For an additional 33% of examinees, information about skill mastery was narrowed down 
to a set of two indistinguishable states. By comparing the attribute response vectors 
associated with each of these states, it would be possible to identify, for each examinee, the 
subset of skills known to be mastered, the subset of skills known not to be mastered, and the 
subset of skills with mastery status still in question. A subsequent test could then be tailored 
to test only those skills which were still in question. 

(3) The data provide no evidence of a gender difference in mastery of elementary document 
processing skills. 
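
The three-way split of skills described in finding (2) can be sketched with set operations. The function name and the example states are hypothetical.

```python
def split_by_mastery_status(candidate_states, attributes):
    """Partition attributes using a set of indistinguishable candidate states.

    candidate_states: non-empty list of sets, each holding the attributes
    mastered in one candidate state.
    """
    mastered_in_all = set.intersection(*candidate_states)
    mastered_in_any = set.union(*candidate_states)
    return {
        "mastered": sorted(mastered_in_all),
        "not_mastered": sorted(set(attributes) - mastered_in_any),
        "in_question": sorted(mastered_in_any - mastered_in_all),
    }


# Hypothetical pair of indistinguishable states over three attributes.
status = split_by_mastery_status([{"A1"}, {"A1", "A2"}], ["A1", "A2", "A3"])
print(status)
```

A follow-up test would then need items only for the attributes listed under "in_question".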

In closing, we wish to note that two aspects of the document literacy application were 
somewhat atypical. First, all of the attributes were hierarchically ordered. Although the 
hierarchical ordering of attributes was responsible for a large reduction in the number of valid 
knowledge states, it was not necessary for application of the Rule Space approach. The only 
characteristics of attributes which are required for application of these procedures are: (1) they 
must be readily dichotomized and (2) they must be diagnostically relevant. Hierarchical 
ordering of the attributes will only come into play when the original variables are expressed 
on an ordinal or an interval scale. 

The document literacy application was also atypical in that the problem of 
indistinguishable states was so pronounced. We wish to emphasize that the missing data 
would not have led to so many indistinguishable states if the cognitive characteristics of the 
items had been considered during the process of constructing item subsets. 



Table 1 

Sample ζ Values 
For Response Patterns with a Number-Correct Score of 3 
from a Five-Item Test 
With Rasch Item Difficulty Parameters of -2, -1, 0, 1, 2 

      Item Response Pattern (a) 
   -2    -1     0     1     2        ζ 

    1     1     1     0     0     -.85 
    1     1     0     1     0      .96 
    1     0     1     1     0     1.98 
    1     1     0     0     1     2.00 
    0     1     1     1     0     2.24 
    1     0     1     0     1     3.02 
    0     1     1     0     1     3.27 
    1     0     0     1     1     4.83 
    0     1     0     1     1     5.09 
    0     0     1     1     1     6.10 

(a) All patterns yield θ = .51. 
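
The common θ value in the table note can be approximately reproduced by solving the Rasch likelihood equation, in which the expected number-correct score equals the observed score. The sketch below assumes a logistic scaling constant of 1.7, which is not stated in the table; the function names are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the source table


def expected_score(theta, difficulties, scale=D):
    """Expected number-correct score under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-scale * (theta - b))) for b in difficulties)


def theta_mle(score, difficulties, scale=D, lo=-6.0, hi=6.0, tol=1e-8):
    """Solve expected_score(theta) = score by bisection.

    Assumes 0 < score < number of items, so that a finite solution exists.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_score(mid, difficulties, scale) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_hat = theta_mle(3, [-2, -1, 0, 1, 2])
print(round(theta_hat, 2))
```

Because the number-correct score is a sufficient statistic for θ under the Rasch model, every response pattern with a score of 3 leads to the same estimate; only ζ distinguishes the patterns.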



Table 2 

The Document Literacy Variables & Attributes 

                                                                      Rows Coded 1 
                                                        Attribute     in the 
Variable Name / Level Description (1)                   Name          Inc. Matrix 

Degree of Correspondence between phrasing in the question 
or directive and in the document: 
   1) literal correspondence                            C1            1 
   2) synonymous correspondence                         C2            1,2 
   3) arrived at via low text-based inference           C3            1,2,3 
   4) arrived at via high text-based inference          C4            1,2,3,4 
   5) requires special prior knowledge                  C5            1,2,3,4,5 

Type of Information processing required to 
identify and match features: 
   1) make a literal feature match                      I1            6 
   2) make a low text-based inference                   I2            6,7 
   3) make a high text-based inference                  I3            6,7,8 
   4) make several conditional matches across nodes     I4            6,7,8,9 
   5) use special prior knowledge                       I5            6,7,8,9,10 

No. of Organizing Categories (OCs) in the Directive: 
   1) 1 or less                                         O1            11 
   2) 2 or more                                         O2            11,12 

No. of Specifics in the Directive: 
   1) 2 or less                                         T1            13 
   2) 3 or more                                         T2            13,14 

Plausibility of Distractors: 
   1) no distractors                                    D1            15 
   2) in same OC but do not share critical features     D2            15,16 
   3) in same OC and do share critical features         D3            15,16,17 
   4) appear in different OCs, at same level            D4            15,16,17,18 
   5) appear in different OCs, at different levels      D5            15,16,17,18,19 

No. of Specifics in the Document: 
   1) 50 or less                                        S1            20 
   2) between 51 and 100, inclusive                     S2            20,21 
   3) greater than 100                                  S3            20,21,22 

1. For complete level descriptions see Kirsch and Mosenthal (1990). 



Table 3 

The Initial Classification Results 
By Classification Outcome Category 
And Average Number of Items Completed 

   No. of        No. of               Avg. No.     Cum. No. 
   States        Subjects      %      Items        Subjs        Cum. % 

   1             600           40     36.2          600           40 
   2             494           33     36.8         1094           73 
   3             203           13     33.2         1297           86 
   4              26            2     32.5         1323           88 
   >=5            39            3     25.7         1362           90 
   Not Class.    147           10     37.1         1509          100 

No. of States = No. of states located at the selected point in the Rule Space. 



Table 4 

The Number and Percent of Classified Examinees 
By Proficiency Group and Gender 

                        Total        No.           Percent 
                        Subjects     Classified    Classified 

Proficiency Group 
   Low                   503          476           95 
   Medium                503          443           88 
   High                  503          443           88 

Gender Group 
   Female                845          787           93 
   Male                  664          575           87 

All Subjects            1509         1362           90 



Table 5 

Analysis of Variance Results 

                        Num.     Den. 
Effect                  DF       DF       F Value (a)     Pr>F 

Between Subjects 
   Proficiency           2       1356       655.44        .0001 
   Gender                1       1356         0.02        .8842 
   Prof X Gen            2       1356         0.64        .5295 

Within Subjects 
   Attributes           14       1343      1868.55        .0000 
   Att. X Prof.         28       2686        99.42        .0000 
   Att X Gender         14       1343         0.75        .7280 
   Att X P X G          28       2686         1.00         -- 

(a) F values for within subject effects were calculated using Wilks' 
Lambda. 



Table 6 

Mean Attribute Mastery Probabilities 

                  Proficiency 
Att        Low      Med      High      Total 

C1         1.00     1.00     1.00      1.00 
C2         0.68     0.87     1.00      0.85 
C3          --       --      0.48      0.36 
C4          --       --      0.48      0.36 
C5         0.01     0.15     0.26      0.14 
D1         1.00     1.00     1.00      1.00 
D2         1.00     1.00     1.00      1.00 
D3         0.70     0.85     1.00      0.85 
D4         0.31     0.33     0.65      0.43 
D5         0.13     0.16     0.24      0.17 
I1         1.00     1.00     1.00      1.00 
I2         0.94     1.00     1.00      0.98 
I3         0.91     1.00     1.00      0.97 
I4         0.68     0.98     1.00      0.88 
I5         0.22     0.72     0.96      0.64 
O1         1.00     1.00     1.00      1.00 
O2         0.72     0.89     1.00      0.87 
S1         1.00     1.00     1.00      1.00 
S2         0.95     1.00     1.00      0.98 
S3         0.56     0.90     1.00      0.82 
T1         1.00     1.00     1.00      1.00 
T2         0.91     1.00     1.00      0.97 

All        0.64     0.75     0.82      0.74 



Table 7 

The Thirty Most Frequent States Ordered by θ 

    θ     Freq.   Attributes Mastered 

  3.05      31    CCCCC IIIII OO TT DDDDD SSS 
  1.72      24    CCCCC IIIII OO TT DDDD- SSS 
  1.28       7    CCCC- IIIII OO TT DDDDD SSS 
  1.11      59    CCCCC IIIII OO TT DDD-- SSS 
  0.81      42    CC--- IIIII OO TT DDDDD SSS 
   .70      38    CCCC- IIIII OO TT DDD-- SSS 
   .62     102    CC--- IIIII OO TT DDDD- SSS 
   .39     296    CC--- IIIII OO TT DDD-- SSS 
   .33      18    CC--- IIII- OO TT DDDD- SSS 
   .29       8    CCCC- IIII- OO TT DDD-- SSS 
   .13      35    CC--- IIII- OO TT DDD-- SSS 
  -.23      64    CCCCC IIIII OO TT DD--- SSS 
  -.29      12    CC--- III-- OO TT DDD-- SSS 
  -.50      19    C---- IIIII OO TT DDDDD SSS 
  -.51      23    CCCC- IIIII O- TT DDDDD SSS 
  -.53       8    CC--- IIII- OO TT DD--- SSS 
  -.59      10    C---- IIIII OO TT DDDD- SSS 
  -.60      57    CCCC- IIIII O- TT DDD-- SSS 
  -.63      14    C---- IIIII OO TT DDD-- SSS 
  -.67      57    CC--- IIII- OO TT DDDD- SS- 
  -.67      42    CC--- IIIII O- TT DDDDD SSS 
  -.74      35    C---- IIII- OO TT DDDD- SSS 
  -.75      74    CC--- IIIII OO TT DDD-- SS- 
  -.78      23    C---- IIII- OO TT DDD-- SSS 
  -.92      38    C---- III-- OO TT DDD-- SSS 
 -1.06      13    CC--- IIII- OO TT DD--- SS- 
 -1.18      45    CC--- III-- OO TT DD--- SS- 
 -1.22      38    C---- III-- OO TT DD--- SSS 
 -1.61       9    CC--- I---- O- TT DD--- SS- 
 -2.03       8    CC--- I---- O- T- DD--- SS- 

Total     1249 






Appendix A 



Proof that, Conditional on a Prior Classification to a Cluster of Indistinguishable States, 
the Posterior Probabilities of all States in the Cluster are Proportional to their Prior 
Probabilities. 



Let r and q be two states which are indistinguishable with respect to the subset of items 
administered. Let s represent the union of r and q. Let X represent an examinee's vector of 
observed item responses. (The number of elements in X will be less than the total number of 
items in the pool.) Since r and q are indistinguishable, we have 

\[ P(X \mid r) = P(X \mid q) = P(X \mid s). \]

The posterior probability of state r, conditional on a prior classification to state s, is calculated 
as 

\[
P(r \mid s, X) = \frac{P(r \text{ and } s \mid X)}{P(s \mid X)}
               = \frac{P(r \mid X)}{P(s \mid X)}
               = \frac{P(X \mid r)\,P(r)}{P(X \mid s)\,P(s)}
               = \frac{P(r)}{P(s)}.
\]

Similarly, the posterior probability of state q, conditional on a prior classification to state s, is 

\[ P(q \mid s, X) = \frac{P(q)}{P(s)}. \]

Thus, conditional on a prior classification to a cluster of indistinguishable states, the posterior 
probability of any state in the cluster is proportional to its prior probability. 
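
The result can be checked numerically. In the sketch below the prior values and the common likelihood values are hypothetical; the only structural assumption is the one used in the proof, namely that every state in the cluster assigns the same likelihood to the observed responses.

```python
def posterior_within_cluster(priors, common_likelihood):
    """Posterior of each state given that the examinee is in the cluster.

    priors: dict mapping state name -> prior probability.
    common_likelihood: P(X | state), identical for all states in the cluster.
    """
    joint = {state: common_likelihood * prior for state, prior in priors.items()}
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}


priors = {"r": 0.3, "q": 0.1}  # hypothetical prior probabilities
post_small = posterior_within_cluster(priors, common_likelihood=0.02)
post_large = posterior_within_cluster(priors, common_likelihood=0.9)
print(post_small, post_large)
```

The common likelihood cancels, so both calls return the same values, 0.75 for r and 0.25 for q, which are the priors rescaled to sum to one within the cluster.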



FIGURE CAPTIONS 
Figure 1. Projection of Examinee Response Data into the Rule Space. 
Figure 2. Projection of the 157 states into the Rule Space. 
Figure 3. Prior probabilities for the 157 states. 
Figure 4. A Tree Representation of the classification results. 






[Figure 1. Projection of Examinee Response Data into the Rule Space. Axis label: THETA.] 

[Figure 2. Projection of the 157 States into the Rule Space. Axis label: THETA.] 

[Figure 3. Prior Probabilities for the 157 States. Axis label: PRIOR.] 